Overview

Dataset statistics

Number of variables: 16
Number of observations: 4238
Missing cells: 645
Missing cells (%): 1.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 529.9 KiB
Average record size in memory: 128.0 B
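
The derived figures in this table can be re-checked from counts reported elsewhere in the report. The sketch below assumes that "missing cells (%)" means missing cells divided by rows times columns, and that 1 KiB is 1024 bytes; the per-variable missing counts are copied from the Variables section.

```python
# Re-derive the overview figures from counts reported in this report.
n_vars = 16
n_obs = 4238

# Per-variable missing counts, copied from the Variables section.
missing_by_variable = {
    "education": 105,
    "cigsPerDay": 29,
    "BPMeds": 53,
    "totChol": 50,
    "BMI": 19,
    "heartRate": 1,
    "glucose": 388,
}

missing_cells = sum(missing_by_variable.values())
missing_pct = 100 * missing_cells / (n_vars * n_obs)

# Assumption: 1 KiB = 1024 bytes.
total_bytes = 529.9 * 1024
avg_record_bytes = total_bytes / n_obs

print(missing_cells)               # 645
print(round(missing_pct, 1))       # 1.0
print(round(avg_record_bytes, 1))  # 128.0
```

An average record of 128 B is consistent with sixteen 8-byte values per row, although the report does not state the column dtypes.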

Variable types

Categorical: 8
Numeric: 8

Warnings

currentSmoker is highly correlated with cigsPerDay (High correlation)
cigsPerDay is highly correlated with currentSmoker (High correlation)
prevalentHyp is highly correlated with sysBP and 1 other field (High correlation)
diabetes is highly correlated with glucose (High correlation)
sysBP is highly correlated with prevalentHyp and 1 other field (High correlation)
diaBP is highly correlated with prevalentHyp and 1 other field (High correlation)
glucose is highly correlated with diabetes (High correlation)
sysBP is highly correlated with diaBP and 1 other field (High correlation)
diaBP is highly correlated with sysBP and 1 other field (High correlation)
education has 105 (2.5%) missing values (Missing)
BPMeds has 53 (1.3%) missing values (Missing)
totChol has 50 (1.2%) missing values (Missing)
glucose has 388 (9.2%) missing values (Missing)
cigsPerDay has 2144 (50.6%) zeros (Zeros)

Reproduction

Analysis started: 2021-08-24 15:41:13.758927
Analysis finished: 2021-08-24 15:41:25.728140
Duration: 11.97 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
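
A report of this kind could be regenerated along the following lines. This is a minimal sketch, assuming the data is available as a CSV file and that pandas and pandas-profiling v3 are installed; the function name `build_report`, the report title, and the file paths are hypothetical, and the report does not say how the original run was configured.

```python
from pathlib import Path


def build_report(csv_path: str, out_path: str = "report.html") -> Path:
    """Profile a CSV file and write the HTML report to out_path."""
    # Imported lazily so the function can be defined without the heavy dependencies.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is an arbitrary placeholder, not taken from the original report.
    report = ProfileReport(df, title="Dataset profile")
    report.to_file(out_path)
    return Path(out_path)
```

Calling `build_report("data.csv")` (a placeholder path) would write `report.html` to the working directory.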

Variables

male
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size33.2 KiB
0
2419 
1
1819 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters4238
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row0
3rd row1
4th row0
5th row0

Common Values

ValueCountFrequency (%)
02419
57.1%
11819
42.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
02419
57.1%
11819
42.9%

Most occurring characters

ValueCountFrequency (%)
02419
57.1%
11819
42.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02419
57.1%
11819
42.9%

Most occurring scripts

ValueCountFrequency (%)
Common4238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02419
57.1%
11819
42.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII4238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02419
57.1%
11819
42.9%

age
Real number (ℝ≥0)

Distinct39
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean49.58494573
Minimum32
Maximum70
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size33.2 KiB

Quantile statistics

Minimum32
5-th percentile37
Q142
median49
Q356
95-th percentile64
Maximum70
Range38
Interquartile range (IQR)14

Descriptive statistics

Standard deviation8.572159925
Coefficient of variation (CV)0.1728782758
Kurtosis-0.9896358464
Mean49.58494573
Median Absolute Deviation (MAD)7
Skewness0.2281457773
Sum210141
Variance73.48192578
MonotonicityNot monotonic
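
Several of the statistics above are functions of one another, which allows a quick consistency check. The sketch below assumes the usual definitions (mean = sum / count, variance = standard deviation squared, CV = standard deviation / mean, IQR = Q3 - Q1); the inputs are copied from the age summary.

```python
# Values copied from the age summary in this report.
n_obs = 4238          # age has no missing values
total = 210141        # Sum
mean = 49.58494573
std = 8.572159925
q1, q3 = 42, 56
minimum, maximum = 32, 70

mean_from_sum = total / n_obs
variance = std ** 2
cv = std / mean
value_range = maximum - minimum
iqr = q3 - q1

print(round(mean_from_sum, 4))  # 49.5849
print(round(variance, 4))       # 73.4819
print(round(cv, 4))             # 0.1729
print(value_range, iqr)         # 38 14
```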
Histogram with fixed size bins (bins=39)
ValueCountFrequency (%)
40191
 
4.5%
46182
 
4.3%
42180
 
4.2%
41174
 
4.1%
48173
 
4.1%
39169
 
4.0%
44166
 
3.9%
45162
 
3.8%
43159
 
3.8%
52149
 
3.5%
Other values (29)2533
59.8%
ValueCountFrequency (%)
321
 
< 0.1%
335
 
0.1%
3418
 
0.4%
3542
 
1.0%
3684
2.0%
3792
2.2%
38144
3.4%
39169
4.0%
40191
4.5%
41174
4.1%
ValueCountFrequency (%)
702
 
< 0.1%
697
 
0.2%
6818
 
0.4%
6745
1.1%
6638
 
0.9%
6557
1.3%
6493
2.2%
63110
2.6%
6299
2.3%
61110
2.6%

education
Categorical

MISSING

Distinct4
Distinct (%)0.1%
Missing105
Missing (%)2.5%
Memory size33.2 KiB
1.0
1720 
2.0
1253 
3.0
687 
4.0
473 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters12399
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row4.0
2nd row2.0
3rd row1.0
4th row3.0
5th row3.0

Common Values

ValueCountFrequency (%)
1.01720
40.6%
2.01253
29.6%
3.0687
 
16.2%
4.0473
 
11.2%
(Missing)105
 
2.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1.01720
41.6%
2.01253
30.3%
3.0687
 
16.6%
4.0473
 
11.4%

Most occurring characters

ValueCountFrequency (%)
.4133
33.3%
04133
33.3%
11720
13.9%
21253
 
10.1%
3687
 
5.5%
4473
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number8266
66.7%
Other Punctuation4133
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
04133
50.0%
11720
20.8%
21253
 
15.2%
3687
 
8.3%
4473
 
5.7%
Other Punctuation
ValueCountFrequency (%)
.4133
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common12399
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
.4133
33.3%
04133
33.3%
11720
13.9%
21253
 
10.1%
3687
 
5.5%
4473
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII12399
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
.4133
33.3%
04133
33.3%
11720
13.9%
21253
 
10.1%
3687
 
5.5%
4473
 
3.8%

currentSmoker
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size33.2 KiB
0
2144 
1
2094 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters4238
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
02144
50.6%
12094
49.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
02144
50.6%
12094
49.4%

Most occurring characters

ValueCountFrequency (%)
02144
50.6%
12094
49.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02144
50.6%
12094
49.4%

Most occurring scripts

ValueCountFrequency (%)
Common4238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02144
50.6%
12094
49.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII4238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02144
50.6%
12094
49.4%

cigsPerDay
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct33
Distinct (%)0.8%
Missing29
Missing (%)0.7%
Infinite0
Infinite (%)0.0%
Mean9.00308862
Minimum0
Maximum70
Zeros2144
Zeros (%)50.6%
Negative0
Negative (%)0.0%
Memory size33.2 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q320
95-th percentile30
Maximum70
Range70
Interquartile range (IQR)20

Descriptive statistics

Standard deviation11.92009359
Coefficient of variation (CV)1.324000473
Kurtosis1.023355805
Mean9.00308862
Median Absolute Deviation (MAD)0
Skewness1.247909903
Sum37894
Variance142.0886311
MonotonicityNot monotonic
Histogram with fixed size bins (bins=33)
ValueCountFrequency (%)
02144
50.6%
20734
 
17.3%
30217
 
5.1%
15210
 
5.0%
10143
 
3.4%
9130
 
3.1%
5121
 
2.9%
3100
 
2.4%
4080
 
1.9%
167
 
1.6%
Other values (23)263
 
6.2%
ValueCountFrequency (%)
02144
50.6%
167
 
1.6%
218
 
0.4%
3100
 
2.4%
49
 
0.2%
5121
 
2.9%
618
 
0.4%
712
 
0.3%
811
 
0.3%
9130
 
3.1%
ValueCountFrequency (%)
701
 
< 0.1%
6011
 
0.3%
506
 
0.1%
453
 
0.1%
4356
 
1.3%
4080
 
1.9%
381
 
< 0.1%
3522
 
0.5%
30217
5.1%
291
 
< 0.1%

BPMeds
Categorical

MISSING

Distinct2
Distinct (%)< 0.1%
Missing53
Missing (%)1.3%
Memory size33.2 KiB
0.0
4061 
1.0
 
124

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters12555
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

ValueCountFrequency (%)
0.04061
95.8%
1.0124
 
2.9%
(Missing)53
 
1.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0.04061
97.0%
1.0124
 
3.0%

Most occurring characters

ValueCountFrequency (%)
08246
65.7%
.4185
33.3%
1124
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number8370
66.7%
Other Punctuation4185
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
08246
98.5%
1124
 
1.5%
Other Punctuation
ValueCountFrequency (%)
.4185
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common12555
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
08246
65.7%
.4185
33.3%
1124
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII12555
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
08246
65.7%
.4185
33.3%
1124
 
1.0%

prevalentStroke
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size33.2 KiB
0
4213 
1
 
25

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters4238
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
04213
99.4%
125
 
0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
04213
99.4%
125
 
0.6%

Most occurring characters

ValueCountFrequency (%)
04213
99.4%
125
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
04213
99.4%
125
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common4238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
04213
99.4%
125
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII4238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
04213
99.4%
125
 
0.6%

prevalentHyp
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size33.2 KiB
0
2922 
1
1316 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters4238
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row1
5th row0

Common Values

ValueCountFrequency (%)
02922
68.9%
11316
31.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
02922
68.9%
11316
31.1%

Most occurring characters

ValueCountFrequency (%)
02922
68.9%
11316
31.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02922
68.9%
11316
31.1%

Most occurring scripts

ValueCountFrequency (%)
Common4238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02922
68.9%
11316
31.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII4238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02922
68.9%
11316
31.1%

diabetes
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size33.2 KiB
0
4129 
1
 
109

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters4238
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
04129
97.4%
1109
 
2.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
04129
97.4%
1109
 
2.6%

Most occurring characters

ValueCountFrequency (%)
04129
97.4%
1109
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
04129
97.4%
1109
 
2.6%

Most occurring scripts

ValueCountFrequency (%)
Common4238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
04129
97.4%
1109
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII4238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
04129
97.4%
1109
 
2.6%

totChol
Real number (ℝ≥0)

MISSING

Distinct248
Distinct (%)5.9%
Missing50
Missing (%)1.2%
Infinite0
Infinite (%)0.0%
Mean236.7215855
Minimum107
Maximum696
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size33.2 KiB

Quantile statistics

Minimum107
5-th percentile170
Q1206
median234
Q3263
95-th percentile312
Maximum696
Range589
Interquartile range (IQR)57

Descriptive statistics

Standard deviation44.59033432
Coefficient of variation (CV)0.1883661527
Kurtosis4.131581824
Mean236.7215855
Median Absolute Deviation (MAD)29
Skewness0.8714220097
Sum991390
Variance1988.297915
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
24085
 
2.0%
22070
 
1.7%
26062
 
1.5%
21061
 
1.4%
23259
 
1.4%
25057
 
1.3%
20056
 
1.3%
22554
 
1.3%
23054
 
1.3%
20553
 
1.3%
Other values (238)3577
84.4%
(Missing)50
 
1.2%
ValueCountFrequency (%)
1071
< 0.1%
1131
< 0.1%
1191
< 0.1%
1241
< 0.1%
1261
< 0.1%
1291
< 0.1%
1331
< 0.1%
1352
< 0.1%
1371
< 0.1%
1402
< 0.1%
ValueCountFrequency (%)
6961
 
< 0.1%
6001
 
< 0.1%
4641
 
< 0.1%
4531
 
< 0.1%
4391
 
< 0.1%
4321
 
< 0.1%
4103
0.1%
4051
 
< 0.1%
3981
 
< 0.1%
3921
 
< 0.1%

sysBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct234
Distinct (%)5.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean132.3524068
Minimum83.5
Maximum295
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size33.2 KiB

Quantile statistics

Minimum83.5
5-th percentile104
Q1117
median128
Q3144
95-th percentile175
Maximum295
Range211.5
Interquartile range (IQR)27

Descriptive statistics

Standard deviation22.03809664
Coefficient of variation (CV)0.1665107358
Kurtosis2.155019383
Mean132.3524068
Median Absolute Deviation (MAD)13
Skewness1.145362136
Sum560909.5
Variance485.6777037
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
120107
 
2.5%
130102
 
2.4%
11096
 
2.3%
11589
 
2.1%
12588
 
2.1%
12484
 
2.0%
12280
 
1.9%
12673
 
1.7%
12873
 
1.7%
12372
 
1.7%
Other values (224)3374
79.6%
ValueCountFrequency (%)
83.52
 
< 0.1%
851
 
< 0.1%
85.51
 
< 0.1%
902
 
< 0.1%
921
 
< 0.1%
92.52
 
< 0.1%
932
 
< 0.1%
93.52
 
< 0.1%
943
0.1%
957
0.2%
ValueCountFrequency (%)
2951
 
< 0.1%
2481
 
< 0.1%
2441
 
< 0.1%
2431
 
< 0.1%
2351
 
< 0.1%
2321
 
< 0.1%
2301
 
< 0.1%
2202
< 0.1%
2171
 
< 0.1%
2153
0.1%

diaBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct146
Distinct (%)3.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean82.8934639
Minimum48
Maximum142.5
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size33.2 KiB

Quantile statistics

Minimum48
5-th percentile66
Q175
median82
Q389.875
95-th percentile104.575
Maximum142.5
Range94.5
Interquartile range (IQR)14.875

Descriptive statistics

Standard deviation11.9108496
Coefficient of variation (CV)0.1436886461
Kurtosis1.277099606
Mean82.8934639
Median Absolute Deviation (MAD)7.5
Skewness0.714102184
Sum351302.5
Variance141.8683382
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
80262
 
6.2%
82152
 
3.6%
85137
 
3.2%
70135
 
3.2%
81131
 
3.1%
84122
 
2.9%
90119
 
2.8%
78116
 
2.7%
87113
 
2.7%
75108
 
2.5%
Other values (136)2843
67.1%
ValueCountFrequency (%)
481
 
< 0.1%
501
 
< 0.1%
511
 
< 0.1%
522
 
< 0.1%
531
 
< 0.1%
541
 
< 0.1%
553
0.1%
562
 
< 0.1%
576
0.1%
57.53
0.1%
ValueCountFrequency (%)
142.51
 
< 0.1%
1401
 
< 0.1%
1362
 
< 0.1%
1352
 
< 0.1%
1332
 
< 0.1%
1321
 
< 0.1%
1305
0.1%
1291
 
< 0.1%
1281
 
< 0.1%
127.51
 
< 0.1%

BMI
Real number (ℝ≥0)

Distinct1363
Distinct (%)32.3%
Missing19
Missing (%)0.4%
Infinite0
Infinite (%)0.0%
Mean25.80200758
Minimum15.54
Maximum56.8
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size33.2 KiB

Quantile statistics

Minimum15.54
5-th percentile20.06
Q123.07
median25.4
Q328.04
95-th percentile32.782
Maximum56.8
Range41.26
Interquartile range (IQR)4.97

Descriptive statistics

Standard deviation4.080111062
Coefficient of variation (CV)0.1581315349
Kurtosis2.656838673
Mean25.80200758
Median Absolute Deviation (MAD)2.49
Skewness0.9819743064
Sum108858.67
Variance16.64730628
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
23.4818
 
0.4%
22.9118
 
0.4%
22.5418
 
0.4%
22.1918
 
0.4%
23.0916
 
0.4%
25.0916
 
0.4%
22.7313
 
0.3%
23.113
 
0.3%
25.2313
 
0.3%
23.6812
 
0.3%
Other values (1353)4064
95.9%
(Missing)19
 
0.4%
ValueCountFrequency (%)
15.541
< 0.1%
15.961
< 0.1%
16.481
< 0.1%
16.592
< 0.1%
16.611
< 0.1%
16.691
< 0.1%
16.711
< 0.1%
16.731
< 0.1%
16.751
< 0.1%
16.871
< 0.1%
ValueCountFrequency (%)
56.81
< 0.1%
51.281
< 0.1%
45.81
< 0.1%
45.791
< 0.1%
44.711
< 0.1%
44.551
< 0.1%
44.271
< 0.1%
44.091
< 0.1%
43.691
< 0.1%
43.671
< 0.1%

heartRate
Real number (ℝ≥0)

Distinct73
Distinct (%)1.7%
Missing1
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean75.87892377
Minimum44
Maximum143
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size33.2 KiB

Quantile statistics

Minimum44
5-th percentile60
Q168
median75
Q383
95-th percentile98
Maximum143
Range99
Interquartile range (IQR)15

Descriptive statistics

Standard deviation12.02659635
Coefficient of variation (CV)0.158497192
Kurtosis0.9074832435
Mean75.87892377
Median Absolute Deviation (MAD)7
Skewness0.6444817335
Sum321499
Variance144.6390198
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
75563
 
13.3%
80385
 
9.1%
70305
 
7.2%
60231
 
5.5%
85227
 
5.4%
72222
 
5.2%
65197
 
4.6%
90172
 
4.1%
68151
 
3.6%
10098
 
2.3%
Other values (63)1686
39.8%
ValueCountFrequency (%)
441
 
< 0.1%
452
 
< 0.1%
461
 
< 0.1%
471
 
< 0.1%
485
 
0.1%
5022
0.5%
511
 
< 0.1%
5217
0.4%
5311
0.3%
5412
0.3%
ValueCountFrequency (%)
1431
 
< 0.1%
1401
 
< 0.1%
1301
 
< 0.1%
1253
 
0.1%
1222
 
< 0.1%
1207
 
0.2%
1155
 
0.1%
1123
 
0.1%
11036
0.8%
1088
 
0.2%

glucose
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct143
Distinct (%)3.7%
Missing388
Missing (%)9.2%
Infinite0
Infinite (%)0.0%
Mean81.96675325
Minimum40
Maximum394
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size33.2 KiB

Quantile statistics

Minimum40
5-th percentile62
Q171
median78
Q387
95-th percentile108.55
Maximum394
Range354
Interquartile range (IQR)16

Descriptive statistics

Standard deviation23.95999819
Coefficient of variation (CV)0.2923136179
Kurtosis58.67427779
Mean81.96675325
Median Absolute Deviation (MAD)8
Skewness6.213401854
Sum315572
Variance574.0815132
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
75193
 
4.6%
77167
 
3.9%
73156
 
3.7%
80152
 
3.6%
70152
 
3.6%
83151
 
3.6%
78148
 
3.5%
74141
 
3.3%
76127
 
3.0%
85127
 
3.0%
Other values (133)2336
55.1%
(Missing)388
 
9.2%
ValueCountFrequency (%)
402
 
< 0.1%
431
 
< 0.1%
442
 
< 0.1%
454
0.1%
473
0.1%
481
 
< 0.1%
503
0.1%
522
 
< 0.1%
535
0.1%
545
0.1%
ValueCountFrequency (%)
3942
< 0.1%
3861
< 0.1%
3701
< 0.1%
3681
< 0.1%
3481
< 0.1%
3321
< 0.1%
3251
< 0.1%
3201
< 0.1%
2971
< 0.1%
2941
< 0.1%

TenYearCHD
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size33.2 KiB
0
3594 
1
644 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters4238
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row1
5th row0

Common Values

ValueCountFrequency (%)
03594
84.8%
1644
 
15.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
03594
84.8%
1644
 
15.2%

Most occurring characters

ValueCountFrequency (%)
03594
84.8%
1644
 
15.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
03594
84.8%
1644
 
15.2%

Most occurring scripts

ValueCountFrequency (%)
Common4238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
03594
84.8%
1644
 
15.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII4238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
03594
84.8%
1644
 
15.2%

Interactions

[Figures omitted: 64 pairwise interaction plots.]

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that the slope of a linear relationship does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
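
The calculation can be sketched in plain Python as follows. This is an illustrative implementation, not the code that produced the report; the function name `pearson_r` is hypothetical, and the example values are the sysBP and diaBP entries of the first five sample rows shown in the Sample section.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# sysBP and diaBP from the first five sample rows of this report.
sys_bp = [106.0, 121.0, 127.5, 150.0, 130.0]
dia_bp = [70.0, 81.0, 80.0, 95.0, 84.0]
print(pearson_r(sys_bp, dia_bp))

# Shifting and rescaling one variable (positive factor) leaves r unchanged.
rescaled = [2 * v + 10 for v in sys_bp]
print(pearson_r(rescaled, dia_bp))
```

Five rows are far too few to say anything about the full dataset; the example only demonstrates the formula and its invariance property.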

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
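
A minimal sketch of this rank-then-correlate procedure is shown below. It assumes that tied values receive the average of their rank positions, which is a common convention but is not stated in the report; the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# A nonlinear but strictly increasing relationship has rho = 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))  # 1.0
```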

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
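
The pair-counting definition can be written down directly. The sketch below implements exactly the formula in the text (often called τ-a); many libraries compute a tie-adjusted variant (τ-b) instead, and the report does not say which variant was used, so treat this as an illustration only. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or in y count as neither concordant nor discordant.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Five concordant pairs and one discordant pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```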

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The Phik project provides extensive documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so this report uses the bias-corrected measure proposed by Bergsma in 2013.
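
The bias-corrected coefficient can be sketched from a contingency table as below. This follows the correction as it is commonly stated for Bergsma's estimator; it is an illustration under that assumption, not the report's own implementation, and the function name and toy tables are hypothetical.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


perfect = [[50, 0], [0, 50]]        # the two variables always agree
independent = [[25, 25], [25, 25]]  # no association at all
print(cramers_v_corrected(perfect))      # close to 1
print(cramers_v_corrected(independent))  # 0.0
```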

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
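
The first and third displays rest on simple quantities that can be computed by hand. The sketch below assumes that nullity correlation means the Pearson correlation between per-column "is missing" indicators, which the report does not spell out. The column names echo this report, but the rows are invented for illustration.

```python
from math import sqrt

# Toy rows; None marks a missing value. Values are invented for illustration.
rows = [
    {"BPMeds": 0.0, "totChol": 195.0, "glucose": 77.0},
    {"BPMeds": None, "totChol": 250.0, "glucose": None},
    {"BPMeds": 0.0, "totChol": None, "glucose": 70.0},
    {"BPMeds": None, "totChol": 225.0, "glucose": None},
    {"BPMeds": 1.0, "totChol": 285.0, "glucose": 85.0},
]
columns = list(rows[0])

# Nullity by column: how many values are missing in each column.
missing_counts = {c: sum(row[c] is None for row in rows) for c in columns}
print(missing_counts)  # {'BPMeds': 2, 'totChol': 1, 'glucose': 2}


def nullity_correlation(a, b):
    """Pearson correlation between the 0/1 missingness indicators of two columns."""
    ia = [1.0 if row[a] is None else 0.0 for row in rows]
    ib = [1.0 if row[b] is None else 0.0 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(ia) / n, sum(ib) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ia, ib))
    var_a = sum((p - mean_a) ** 2 for p in ia)
    var_b = sum((q - mean_b) ** 2 for q in ib)
    return cov / sqrt(var_a * var_b)


# BPMeds and glucose are missing in exactly the same rows here.
print(nullity_correlation("BPMeds", "glucose"))
# totChol is missing only where BPMeds is present, so the value is negative.
print(nullity_correlation("BPMeds", "totChol"))
```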

Sample

First rows

row   male  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp  diabetes  totChol  sysBP  diaBP  BMI    heartRate  glucose  TenYearCHD
0     1     39   4.0        0              0.0         0.0     0                0             0         195.0    106.0  70.0   26.97  80.0       77.0     0
1     0     46   2.0        0              0.0         0.0     0                0             0         250.0    121.0  81.0   28.73  95.0       76.0     0
2     1     48   1.0        1              20.0        0.0     0                0             0         245.0    127.5  80.0   25.34  75.0       70.0     0
3     0     61   3.0        1              30.0        0.0     0                1             0         225.0    150.0  95.0   28.58  65.0       103.0    1
4     0     46   3.0        1              23.0        0.0     0                0             0         285.0    130.0  84.0   23.10  85.0       85.0     0
5     0     43   2.0        0              0.0         0.0     0                1             0         228.0    180.0  110.0  30.30  77.0       99.0     0
6     0     63   1.0        0              0.0         0.0     0                0             0         205.0    138.0  71.0   33.11  60.0       85.0     1
7     0     45   2.0        1              20.0        0.0     0                0             0         313.0    100.0  71.0   21.68  79.0       78.0     0
8     1     52   1.0        0              0.0         0.0     0                1             0         260.0    141.5  89.0   26.36  76.0       79.0     0
9     1     43   1.0        1              30.0        0.0     0                1             0         225.0    162.0  107.0  23.61  93.0       88.0     0

Last rows

row   male  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp  diabetes  totChol  sysBP  diaBP  BMI    heartRate  glucose  TenYearCHD
4228  0     50   1.0        0              0.0         0.0     0                1             1         260.0    190.0  130.0  43.67  85.0       260.0    0
4229  0     51   3.0        1              20.0        0.0     0                1             0         251.0    140.0  80.0   25.60  75.0       NaN      0
4230  0     56   1.0        1              3.0         0.0     0                1             0         268.0    170.0  102.0  22.89  57.0       NaN      0
4231  1     58   3.0        0              0.0         0.0     0                1             0         187.0    141.0  81.0   24.96  80.0       81.0     0
4232  1     68   1.0        0              0.0         0.0     0                1             0         176.0    168.0  97.0   23.14  60.0       79.0     1
4233  1     50   1.0        1              1.0         0.0     0                1             0         313.0    179.0  92.0   25.97  66.0       86.0     1
4234  1     51   3.0        1              43.0        0.0     0                0             0         207.0    126.5  80.0   19.71  65.0       68.0     0
4235  0     48   2.0        1              20.0        NaN     0                0             0         248.0    131.0  72.0   22.00  84.0       86.0     0
4236  0     44   1.0        1              15.0        0.0     0                0             0         210.0    126.5  87.0   19.16  86.0       NaN      0
4237  0     52   2.0        0              0.0         0.0     0                0             0         269.0    133.5  83.0   21.47  80.0       107.0    0